Finding Every Medical Terms by Life Science Dictionary for MedNLP
Abstract
We have been developing an English-Japanese thesaurus of medical terms for 20 years. The thesaurus is compatible with MeSH (Medical Subject Headings, developed by the National Library of Medicine, USA) and contains approximately 30,000 headings with 200,000 synonyms (covering the names of anatomical concepts, biological organisms, chemical compounds, methods, diseases and symptoms). In this study, we aimed to extract as many medical terms as possible from the test data using a simple longest-matching Perl script. After converting the supplied UTF-8 text to EUC encoding, the matching process took only 2 minutes on a desktop computer (Apple Mac Pro), including loading a 10 MB dictionary into memory. From the 0.1 MB test document, 2,569 terms (including English spellings) were tagged and visualized as color-coded HTML. Focusing on the names of diseases and symptoms in particular, 893 terms were found, with several errors and omissions. However, the process is limited in its handling of ambiguous abbreviations and misspelled words. The simple longest-matching strategy may be useful as a preprocessing step for medical reports.
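The longest-matching strategy described in the abstract can be illustrated with a short sketch. The authors used a Perl script on EUC-encoded text; the Python version below is not their code, only a minimal illustration of greedy longest-match dictionary tagging followed by HTML markup. The function names (`longest_match_tag`, `to_html`), the category labels, and the three-entry toy dictionary are all hypothetical stand-ins for the real 10 MB dictionary.

```python
import html


def longest_match_tag(text, dictionary):
    """Greedy longest-match tagging (illustrative sketch, not the authors' script).

    At each position, take the longest dictionary term that starts there;
    if none matches, advance by one character.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    matches = []
    i = 0
    while i < len(text):
        found = None
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                found = candidate
                break
        if found is not None:
            matches.append((i, i + len(found), found, dictionary[found]))
            i += len(found)
        else:
            i += 1
    return matches


def to_html(text, matches):
    """Wrap each matched term in a <span> whose class is its category."""
    parts, pos = [], 0
    for start, end, term, category in matches:
        parts.append(html.escape(text[pos:start]))
        parts.append(f'<span class="{category}">{html.escape(term)}</span>')
        pos = end
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


# Hypothetical toy dictionary: term -> category.
toy_dictionary = {
    "糖尿病": "disease",
    "糖尿病性腎症": "disease",
    "頭痛": "symptom",
}

sample = "患者は糖尿病性腎症と頭痛を認めた。"
tagged = longest_match_tag(sample, toy_dictionary)
print(to_html(sample, tagged))
```

Because the longer entry 糖尿病性腎症 is tried before its prefix 糖尿病, the sketch tags the whole compound rather than only its first three characters. As the abstract notes, purely dictionary-based matching of this kind cannot resolve ambiguous abbreviations or recover misspelled words.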
Related papers
Finding Specific Medical Terms Using the Life Science Dictionary for MedNLP
We have been developing an English-Japanese thesaurus of medical terms for the past 20 years. The thesaurus is compatible with MeSH (Medical Subject Headings, developed by the National Library of Medicine, USA) and contains approximately 30,000 headings with 200,000 synonyms (consisting of the names of anatomical concepts, biological organisms, chemical compounds, methods, diseases and symptoms...
NECLA at the Medical Natural Language Processing Pilot Task (MedNLP)
This paper gives an overview of NECLA’s submitted systems for the De-Identification and Complaint & Diagnosis subtasks of the Medical Natural Language Processing Pilot Task (MedNLP)[5]. Our systems combine features derived from Part of Speech (POS) tags, a domain-specific dictionary, the Unified Medical Language System (UMLS) metathesaurus and semantic network, and a small set of heuristics bas...
An Trial Report to NTCIR10 MedNLP: Extracting Medical Diagnostic Term by Machine Learning
This paper explains our approach to the NTCIR10-MEDNLP[1] tasks and the problems we encountered. We selected the term extraction tasks because we have some experience with keyword extraction[2]. Since it is hard to build an accurate dictionary or lexicon for medical terms, we aimed to use machine learning with a large amount of roughly tagged medical corpus as training data. However, we are...
kyoto: Kyoto University Baseline at the NTCIR-11 MedNLP-2 Task
Since more electronic records are now used at medical scenes, the importance of technical development for analyzing such electronically provided information has been increasing significantly. This NTCIR-11 MedNLP-2 Task is designed to meet this situation. This task is a shared task that evaluates natural language processing technologies especially on Japanese medical texts. The task has three s...
Study and Recognition of Muslim Sage Abdullah Azdi and His Medical Dictionary Called “Kitāb Al-ma”
This study seeks to identify one of the pioneers of traditional clinical medicine named Abdullah Azdi and his medical dictionary. This research is an analytical study. The focus of the search was on two keywords, Abdullah Azdi and Kitab al-Ma'ma, but the scope of the search included all appropriate terms such as: medicine, Bu Ali Sina, traditional medicine, medical dictionary, ethics, and medic...